1 Introduction



The standard derivation of the laws of motion from the principle of least action 
is a cornerstone of all major physical theories. However, this derivation is 
based on a number of postulates which are too unnatural to be considered 
as axioms. For instance, why at the fundamental level does the Lagrangian
$L$ of a composite system always have the form $L = L^1 + L^2 - V$, where
$L^1$ and $L^2$ are the Lagrangians of free subsystems and $V$ accounts for the
interaction? Indeed, what meaning can we assign to the difference between 
kinetic and potential energy, which is used as a template for many physical 
Lagrangians? Furthermore, why is the action defined as an integral of L, and 
why do we obtain correct classical equations by minimizing this integral? These 
questions are equally important in almost every theory which uses Lagrangian 
formulation. We may therefore hope that the answers to these questions can 
be useful in finding the ultimate physical theory. 

*e-mail: a.soklakov@rhul.ac.uk

According to Occam's Razor, simple theories are more economical and are
usually better suited for making predictions. Indeed, all fundamental laws of
physics are surprisingly simple in form. In this paper we introduce Occam's
Razor in the form of a physical principle that we call the simplicity principle (SP).
Using the SP we answer the above questions: we explain the structure of the
Lagrangian of a composite physical system together with the other postulates
behind Hamilton's principle of stationary action. In this sense we derive
Hamilton's principle of stationary action.

As a first step, we introduce the standard notion of the state space of a me- 
chanical system and derive Newton's second law from just two extra postulates, 
namely the SP and the Galilean relativity principle. The purpose of this deriva- 
tion is to demonstrate our approach using the particularly well known case of 
Newtonian mechanics. The important contribution from our approach is that 
the mathematical structure of the SP alone implies that all fundamental inter- 
actions can be accounted for by adding an extra term to the Lagrangian. This 
result is independent of the particular theory. In other words, different theo- 
ries correspond to different Lagrangians for free elementary systems, whereas 
the SP tells us how to introduce interactions between them. This means, for 
instance, that in applying our theory to the relativistic case it is enough to
consider one-particle cases, such as a free relativistic particle and a relativistic
particle in an external gravitational field. The rest of the arguments follow 
simply by replacing Galilean relativity with Einstein's principle of relativity 
and his principle of equivalence. 



2 Dynamical laws and the Simplicity Principle 

The task of theoretical physics is to find algorithms that can correctly repro- 
duce or predict experimental data. However, not every such algorithm can 
be considered satisfactory, as real understanding implies that a minimal set 
of simple axioms is found and all experimental results can be reproduced as a 
consequence of these axioms. The axioms should introduce a state model able 
to describe the system instantaneously, and the dynamical laws that describe 
any physical changes of the system's state. Since the set of evolution histories 
is incomparably more numerous than the set of system states, the complexity 
of the system dynamics given the system state model can be very large. In this 
context it is not trivial that all fundamental laws of physics should be simple 
in form, yet this is true for all known fundamental laws, even ones as diverse 
as gravitation and quantum mechanics. In this paper we propose to use this 
fact as a common ground for the laws of physics. 

Given a physical system, consider the set $\mathbb{S} = \{\xi\}$ of all states in which the
system can be prepared or experimentally found. We thereby require that each
$\xi \in \mathbb{S}$ contains a complete description of a system state. This means that there
can be no hidden information (such as the preparation history) that would
distinguish otherwise identical system states. A function $f : \mathbb{S} \to \mathbb{S} \times \mathbb{S} \times \mathbb{S} \times \cdots$
is called a dynamical law if, for any initial state $\xi_0 \in \mathbb{S}$, the value $f(\xi_0)$ is an
ordered sequence $\{\xi_1, \xi_2, \dots\}$, $\xi_k \in \mathbb{S}$. Physically, $f(\xi_0)$ defines a trajectory in
$\mathbb{S}$ associated with the initial state $\xi_0$.

By the definition of $\mathbb{S}$, we assign the same physical meaning to a system state
regardless of its preparation history. In other words, if the system passes through
an intermediate state $\xi_i$ then the predicted evolution following $\xi_i$ should not
depend on how the system reached $\xi_i$. For each dynamical law $f$ that satisfies
this requirement we can find a function $g$ such that for every two consecutive
points $\xi_i$ and $\xi_{i+1}$ in the trajectory $f(\xi_0)$ we have $\xi_{i+1} = g(\xi_i)$. For further
reference we will call such laws Markovian. It is clear that without loss of
generality we can consider only Markovian laws. This is because the preparation
history can always be included in the description of the system state, in which
case the evolution will be Markovian.
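
The last remark can be illustrated with a minimal Python sketch. It is an illustration only and not part of the derivation: the names `trajectory`, `two_term_rule` and `g` are hypothetical, and integer pairs stand in for system states. A rule that needs the two latest values is not Markovian on single integers, but becomes Markovian once the required history is included in the state.

```python
from typing import Callable, List, Tuple

State = Tuple[int, int]  # hypothetical state: (previous value, current value)

def trajectory(g: Callable[[State], State], xi0: State, steps: int) -> List[State]:
    """Return [xi_1, ..., xi_steps] generated from xi0 by the one-step map g."""
    out, xi = [], xi0
    for _ in range(steps):
        xi = g(xi)
        out.append(xi)
    return out

def two_term_rule(prev: int, cur: int) -> int:
    """A rule that depends on history: the next value needs the last two values."""
    return prev + cur

def g(state: State) -> State:
    """Markovian version: the needed history is part of the state itself."""
    prev, cur = state
    return (cur, two_term_rule(prev, cur))

traj = trajectory(g, (0, 1), 5)
# Restarting from an intermediate state reproduces the rest of the trajectory,
# regardless of how that state was reached.
restart = trajectory(g, traj[1], 3)
```

Here `restart` coincides with the tail of `traj`, which is the property required of a Markovian law.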

Not all dynamical laws describe the actual system evolution equally well.
Considering the set of all possible dynamics as a hypothesis space, we may follow
one of the standard approaches in the formal induction theory [1]. For instance,
we can try to find a single law which, by some criterion, is better than any 
other law. Alternatively, we can try to use the individual predictions of each 
possible law and formulate our final prediction by averaging over all individual 
predictions using some "prior" probability distribution. In this paper we fol- 
low the strategy of singling out only one dynamical law and leave the second, 
more general approach, for the discussion of further research on quantization 
(section |]). 

In order to discriminate between different dynamical laws, we postulate that 
the most economical set of axioms for a physical theory includes the simplicity 
principle (SP): among all dynamical laws that are consistent with all the other
axioms, the laws with the smallest descriptional complexity predominate the
system's behavior. The SP has philosophical and historical roots in Occam's
Razor, which was stated by Isaac Newton as Rule I for natural philosophy in his 
famous Principia. Relatively recently, Occam's Razor became a cornerstone of 
the modern theory of induction and computational learning as introduced by 
Solomonoff in 1964 (see also Ref. for a thorough review including more 
recent developments). 

In contrast to computational learning, we do not analyze a collection of raw 
experimental data. Using the examples of Newtonian and relativistic mechanics,
we demonstrate that only the most general axioms (such as the Galilean
relativity principle or Einstein's relativity principles) are sufficient to
complete the theory if combined with the SP. Physics enters our formalism both
through the definition of the "state space" $\mathbb{S}$ of the system and through the
relativity principles; these are taken as experimental facts. The SP provides an
inference tool for finding the simplest dynamical theory consistent with these 
experimental facts. 

The main weakness of this paper is a rather artificial proof that there is 
no contradiction in using Kolmogorov complexity to quantify the complex- 
ity of dynamical laws in the case of Newtonian mechanics. The requirement 
of Galilean relativity appears as a constraint on the complexity of physical 
dynamics. Even though this constraint is mathematically consistent and can 
be satisfied, it appears to be rather artificial in the framework of algorithmic 
information theory. This does not occur in the relativistic case which natu- 
rally follows from an absolutely analogous yet technically simpler analysis. We 
suspect that the difficulties in the Newtonian case arise from the special role 
of time, although a more suitable measure of complexity of dynamical laws 
probably can be proposed. 



3 Mathematical background and key ideas 

To formulate the SP mathematically we need a measure of complexity which 
can be assigned to individual objects (as we need to discriminate between 
particular laws). In 1963-1965, A. N. Kolmogorov (see e.g. [Q) proposed to 
consider this problem in the framework of the general theory of algorithms. 
Similar results were obtained by R. J. Solomonoff [^, 0], and by G. J. Chaitin 
0; these three authors had different motivations and worked independently 
from one another [1]. Significant progress has been made to improve the original
definitions of complexity so as to increase the range of applications. For the
purpose of this paper we will use the prefix version of Kolmogorov complexity 
which was introduced by Levin |^, Gacs [Q and Chaitin 0. 

All the key properties of prefix complexity that are necessary for our results
are summarized by Eq. (6). This is important because Eq. (6) represents a
very typical property of information in both algorithmic and probabilistic
approaches. This property is often illustrated by Venn diagrams which make
Eq. (6) a very natural requirement for any information-theoretic measure of
complexity. In physics, we often deal with integrable, differentiable and even
smooth functions. Kolmogorov complexity can be interpolated by such functions
only in special cases, and under some severe restrictions on its arguments.
This fact makes it difficult to work with Kolmogorov complexity even though
Eq. (6) is all we really need at this stage. We therefore acknowledge that
by using an alternative measure of complexity which obeys Eq. (6) we may
considerably simplify the arguments of this paper. In the first reading, we
recommend noting property (^ and proceeding with subsection pl3| , skipping 
the following subsection. 



3.1 Prefix Kolmogorov Complexity 

In this subsection we review the definition and some important properties of
the prefix complexity. Let $\mathbb{X} = \{\Lambda, 0, 1, 00, 01, 10, 11, 000, \dots\}$ be the set of
finite binary strings where $\Lambda$ is the string of length 0. Any subset of $\mathbb{X}$ is called
a code. Any string in a code has a well defined length and the set of string
lengths is an important characteristic of the code. An instantaneous code is
a set of strings $Y \subset \mathbb{X}$ with the property that no string in $Y$ is a prefix of
another. A prefix computer is a partial recursive function^1 $C : Y \times \mathbb{X} \to \mathbb{X}$.
For each $p \in Y$ (program string) and for each $d \in \mathbb{X}$ (data string) the output
of the computation is either undefined or given by $C(p, d) \in \mathbb{X}$. Following the
usual motivation [1], we restrict our attention to prefix computers. This is a
very weak restriction in the sense that every uniquely decodable code can be
replaced by an instantaneous code without changing the set of string lengths
[1]. Consider a mathematical object that has a binary string $\alpha$ as its complete
description. The idea is to choose some reference computer $C$, find the shortest
program that makes $C$ compute $\alpha$ given data $d$, and use the length $K_C(\alpha|d)$
of the program (in bits) as the measure of the object's complexity. Formally,
the complexity of $\alpha$ given data $d$ relative to computer $C$ is

$$K_C(\alpha|d) = \min_p \{\, |p| : C(p, d) = \alpha \,\}, \qquad (1)$$

where $|p|$ denotes the length of the program $p$ (in bits).
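
To make definition (1) concrete, the following Python sketch evaluates it for a toy computer. This is a hedged illustration under stated assumptions: the three-program table `PROGRAMS` and the names `C`, `K_C` and `is_prefix_free` are hypothetical, the toy computer is far from universal, and for a universal reference computer no such exhaustive evaluation is possible in general.

```python
def is_prefix_free(code):
    """True if no string in `code` is a prefix of another (instantaneous code)."""
    return not any(a != b and b.startswith(a) for a in code for b in code)

# Hypothetical toy computer: each program string is a fixed operation on the data d.
PROGRAMS = {
    "0": lambda d: d,         # copy the data
    "10": lambda d: d[::-1],  # reverse the data
    "11": lambda d: d + d,    # duplicate the data
}

def C(p, d):
    """Output of the toy computer, or None where the computation is undefined."""
    op = PROGRAMS.get(p)
    return None if op is None else op(d)

def K_C(alpha, d):
    """Length of the shortest program p with C(p, d) == alpha, cf. Eq. (1).

    Returns None if no program of the toy computer produces alpha from d.
    """
    lengths = [len(p) for p in PROGRAMS if C(p, d) == alpha]
    return min(lengths) if lengths else None
```

For example, `K_C("110", "011")` is 2 because only the two-bit program `"10"` maps `"011"` to `"110"`, whereas `K_C("011", "011")` is 1 because the one-bit copy program suffices.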

Since this complexity measure depends strongly on the reference computer, it
is important to find an optimal computer $U$ for which $K_U(\alpha|d) \le K_C(\alpha|d) + \kappa_C$
for any prefix computer $C$ and for all $\alpha$ and $d$, where $\kappa_C$ is a constant depending
on $C$ (and $U$) but not on $\alpha$ or $d$. It turns out that the set of prefix computers
contains such a $U$ and, moreover, it can be constructed so that any prefix
computer can be simulated by $U$: for further details consult [1]. Such a $U$ is
called a universal prefix computer and its choice is not unique. Using some
particular universal prefix computer $U$ as a reference, we define the conditional
Kolmogorov complexity of $\alpha$ given $\beta$ as $K_U(\alpha|\beta)$.

^1 A partial function which is computed by a Turing machine.
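
The above definitions are generalized for the case of many strings as follows.
We choose and fix a particular recursive bijection $B : \mathbb{X} \times \mathbb{X} \to \mathbb{X}$.
Let $\{\alpha^i\}_{i=1}^{n}$ be a set of $n$ strings $\alpha^i \in \mathbb{X}$. For $2 \le k \le n$ we define
$\langle \alpha^1, \alpha^2, \dots, \alpha^k \rangle \equiv B(\langle \alpha^1, \dots, \alpha^{k-1} \rangle, \alpha^k)$, and $\langle \alpha^1 \rangle \equiv \alpha^1$. We can now define
$K_U(\alpha^1, \dots, \alpha^n | \beta^1, \dots, \beta^m) \equiv K_U(\langle \alpha^1, \dots, \alpha^n \rangle | \langle \beta^1, \dots, \beta^m \rangle)$.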

For any two universal prefix computers $U_1$ and $U_2$ we have, by definition,
$|K_{U_1}(\alpha|\beta) - K_{U_2}(\alpha|\beta)| \le \kappa(U_1, U_2)$ where $\kappa(U_1, U_2)$ is a constant that depends
only on $U_1$ and $U_2$ and not on $\alpha$ or $\beta$. In many standard applications of
Kolmogorov complexity the set of reference computers is considered to be finite
and the attention is focused on complex objects such as random or nearly random
long strings. In such cases, Kolmogorov complexity becomes an asymptotically
absolute measure of the complexity of individual strings: the constant
$\kappa(U_1, U_2)$ can be neglected in comparison to the value of the complexity. For
this reason, many fundamental properties of Kolmogorov complexity are established
up to an error term which can be neglected compared to the complexity
of the considered strings. For instance, the standard analysis of the prefix
Kolmogorov complexity ([1], Section 3.9.2) gives

$$K_U(\alpha, \gamma|\beta) = K_U(\alpha|\gamma, \beta) + K_U(\gamma|\beta) + \Delta, \qquad (2)$$

where $\Delta$ is the error term which grows logarithmically with the complexity of
considered strings. In our case such accuracy is unacceptable as we want to
use $K_U$ to analyze simple dynamical laws for which the complexity is small and
terms like $\Delta$ cannot be neglected. Fortunately, in the case of simple strings
(see Definition 1 in Ref. [10]) this problem can be solved by a natural restriction
of reference computers [10]. Roughly speaking, this restriction entails the
requirement that switching to a more complex reference computer should always
be accompanied by an equivalent reduction of program lengths, i.e. more
complex computers are required to be more "powerful". Denoting a set of
computers which satisfies this requirement by $\{W_s\}$ we then construct a computer
$W$ which is universal for this set by setting $W(p, \langle s, d \rangle) = W_s(p, d)$ and
use any such $W$ as a reference. By a slight abuse of notation, for any simple
pair of strings $(\alpha, \gamma)$, we have by Theorem 1 in Ref. [10]:

$$K_W(\alpha, \gamma|\beta) = K_W(\alpha|\gamma, \beta) + K_W(\gamma|\beta) + \mathrm{const}, \qquad (3)$$

where the constant depends only on the reference machine $W$ (not on $\alpha$, $\beta$
or $\gamma$).

It is important to keep in mind that Kolmogorov complexity becomes a particular
function only if the reference computer is given. A set of reference
computers defines a set of complexity functions which have some properties in
common, e.g. Eq. (3), but nevertheless, individual complexity functions can
look very different from one another. In order to verify whether any particular
function $G : \mathbb{X} \to \mathbb{N}$ is a Kolmogorov measure of complexity it is necessary
and sufficient to find a reference computer $W$ such that $K_W = G$. The rest of
this subsection deals with the properties that are common to all complexity
functions defined by the set of reference computers $\{W\}$.

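For any particular reference computer $W$, we can simplify the notation $K \equiv K_W$
and use Eq. (3) to show that

$$\begin{aligned}
K(\alpha^1, \dots, \alpha^N|d) &= K(\alpha^N|d) + K(\alpha^1, \dots, \alpha^{N-1}|\alpha^N, d) + \mathrm{const} \\
&= K(\alpha^N|d) + \sum_{n=1}^{N-1} K(\alpha^{N-n}|\alpha^{N-n+1}, \dots, \alpha^N, d) + \mathrm{const} \\
&= K(\alpha^N|d) + \sum_{n=1}^{N-1} K(\alpha^{n}|\alpha^{n+1}, \dots, \alpha^N, d) + \mathrm{const}. \qquad (4)
\end{aligned}$$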

Defining the conditional mutual information of objects $\alpha$ and $\gamma^1, \gamma^2, \dots, \gamma^N$ as

$$I(\alpha : \gamma^1, \dots, \gamma^N|d) \equiv K(\alpha|d) - K(\alpha|\gamma^1, \dots, \gamma^N, d) \qquad (5)$$

(consult Ref. ^) we have 



$$K(\alpha^1, \dots, \alpha^N|d) = \sum_{n=1}^{N} K(\alpha^n|d) - \sum_{n=1}^{N-1} I(\alpha^n : \alpha^{n+1}, \dots, \alpha^N|d) + \mathrm{const}. \qquad (6)$$



This equation will soon become important for the complexity analysis of dy- 
namical laws. 
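
The structure of Eq. (6) is shared by probabilistic measures of information. The Python sketch below checks the two-object case for Shannon entropy, where the relation holds exactly with zero constant. It is only an analogue under stated assumptions: Shannon entropy replaces Kolmogorov complexity, and the joint distribution `joint` is an arbitrary example chosen for illustration.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical joint distribution of two correlated binary variables (x, y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x = [sum(p for (x, _), p in joint.items() if x == v) for v in (0, 1)]
p_y = [sum(p for (_, y), p in joint.items() if y == v) for v in (0, 1)]

h_xy = entropy(joint.values())
h_x, h_y = entropy(p_x), entropy(p_y)

# Conditional entropy H(X|Y), computed directly from the conditional distributions.
h_x_given_y = sum(
    p_y[y] * entropy([joint[(x, y)] / p_y[y] for x in (0, 1)]) for y in (0, 1)
)

# Analogue of Eq. (5): information in Y about X.
mutual_info = h_x - h_x_given_y

# Analogue of Eq. (6) for N = 2: joint = sum of parts minus mutual information.
residual = h_xy - (h_x + h_y - mutual_info)
```

The quantity `residual` vanishes up to floating-point rounding, which is the Venn-diagram property referred to in the text.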



3.2 Complexity of dynamical laws 

Recall that a dynamical law was defined earlier as a function on the state
space of the system. If we had a definition of the complexity of a function we
would therefore be able to quantify the complexity of a dynamical law. For
any function ${}^{J}\!f : \mathbb{X} \to \mathbb{X}_1 \times \cdots \times \mathbb{X}_J$ ($\mathbb{X}_j = \mathbb{X}$), the complexity of ${}^{J}\!f$ at $x_0 \in \mathbb{X}$
is defined as

$$K_{x_0}[{}^{J}\!f] \equiv \frac{1}{J} \sum_{k=0}^{J-1} K(x_{k+1}|x_k), \qquad (7)$$

where the ordered sequence $\{x_1, x_2, \dots, x_J\} = {}^{J}\!f(x_0)$. It is helpful to illustrate
this definition for the case of Markovian laws as defined in section 2. If ${}^{J}\!f$
is Markovian, then by definition there exists a function $g : \mathbb{X} \to \mathbb{X}$ such that
$x_{k+1} = g(x_k)$ and we have

$$K_{x_0}[{}^{J}\!f] = \frac{1}{J} \sum_{k=0}^{J-1} K(g(x_k)|x_k). \qquad (8)$$

The complexity of ${}^{J}\!f$ is therefore equal to the complexity of $g$ at a typical
point in the trajectory $\{x_0, \dots, x_J\}$. In other words, the complexity of ${}^{J}\!f$ at
$x_0$ quantifies the amount of information needed to compute a typical step of
the trajectory generated by ${}^{J}\!f$ from the initial condition $x_0$.
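
The averaging structure of definition (7) can be sketched in Python as follows. This is a rough illustration only: the function `step_cost` is a hypothetical, computable stand-in for the conditional complexity of one step (here, the number of bits needed to write the increment between consecutive integer states) and is not Kolmogorov complexity.

```python
def step_cost(x_next: int, x: int) -> int:
    """Hypothetical stand-in for K(x_next | x): bits needed to write the increment."""
    return abs(x_next - x).bit_length()

def law_complexity(x0, traj, cost=step_cost):
    """Average per-step cost along the trajectory [x_1, ..., x_J] started at x0."""
    xs = [x0] + list(traj)
    J = len(traj)
    return sum(cost(xs[k + 1], xs[k]) for k in range(J)) / J

uniform = law_complexity(0, [3, 6, 9, 12])  # constant increment of 3
at_rest = law_complexity(0, [0, 0, 0])      # state never changes
```

With this stand-in, a trajectory with a constant increment has the same cost at every step, so the average equals the cost of a single typical step, in line with the interpretation given above.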



3.3 Key ideas 

Before we introduce our derivation of Newtonian mechanics, it is relevant to 
recall the definition of a Newtonian mechanical system and highlight the con- 
ceptual difficulties of the standard approach. A Newtonian mechanical system 
consists of particles whose dimensions can be neglected in describing their mo- 
tion. The position of a particle in space is defined by its Cartesian coordinates 
$r = (x, y, z)$. The derivative $\dot{r} = (\dot{x}, \dot{y}, \dot{z}) = (dx/dt, dy/dt, dz/dt)$ of the coordinates
with respect to time $t$ is called the Cartesian velocity of the particle.
The physical state of the system is completely determined if the coordinates 
and the velocities are determined for every particle in the system. For every 
mechanical system one can write a function of its state that together with an 
appropriate dynamical principle defines the system evolution. This function 
is called the Lagrangian of the system and is usually postulated, except for 
a few special cases where it can be derived. For example, one can show that 
the Lagrangian of a single free particle is proportional to its squared velocity. 
This fact is a direct consequence of the Galilean principle of relativity and 
the classical definitions of homogeneous isotropic space and homogeneous time 
(Ref. [11], §3,4). Unfortunately, this is about all one can explain using the
standard approach. Certainly, we have no satisfactory explanation of why the 
Lagrangian of a mechanical system has the accepted form L = T — V and why 
we minimize a functional of "action" which is an integral of L along a short 
segment of a path. Indeed, the standard derivation of the equations of motion 
from the principle of stationary action uses Newton's second law as an established
fact [12]. An analogous situation is found in the standard derivation of
the equations of wave mechanics in the Lagrangian formulation [13]. It may
seem that the Lagrangian formulation of the laws of dynamics is merely one
way of writing them down. Nevertheless, the Lagrangian formulation plays
an important role in understanding the physical world in many areas due to
its truly remarkable ability to unify various types of interactions. Quoting
R. P. Feynman [14]: "We regard the action to be the more fundamental quantity.
From it we can immediately read off the rules for the propagators, the
coupling, and the equations of motion. But we still do not know the reason^2
for the rules for the diagrams, or why we can get the propagators out of S [the
action]". Whatever the physical interaction, if it is well defined and understood,
it is often enough to add one extra term in the Lagrangian to describe
it.

Adding interaction terms to free Lagrangians is a rather specific way of in- 
troducing interactions. In conservative nonrelativistic mechanics, for example, 
interactions are typically considered as functions of the relative positions of 
the interacting subsystems (Ref. [11], §5). Lagrangians of free subsystems are
functions of a different type: they can only depend on the absolute states 
describing each subsystem individually. The total Lagrangian, including inter- 
action, is constructed as a difference between free Lagrangians and interaction 
terms. Not every function of the combined system state can be represented in 
this way. 

We suggest that property (6) of complexity measures may provide an explanation
for the general structure of the Lagrangian for a composite physical
system. Considering, for instance, a pair of strings $\alpha^1$ and $\alpha^2$, we have from
(6) that the complexity of them both given any data $d$ is given by

$$K(\alpha^1, \alpha^2|d) = K(\alpha^1|d) + K(\alpha^2|d) - I(\alpha^1 : \alpha^2|d) + \mathrm{const}, \qquad (9)$$

where the first two terms represent the complexities of $\alpha^1$ and $\alpha^2$ considered
independently from one another. The third term $I(\alpha^1 : \alpha^2|d)$ quantifies the
strength of correlation between the two strings which can be viewed as an
amount of information in one string about the other, given initial knowledge
$d$. A typical action of a composite physical system has the same structure. For
the action $S^{1,2}$ of a bipartite system we would normally write

$$S^{1,2} = S^1 + S^2 - S^{\mathrm{int}} + \mathrm{const}, \qquad (10)$$

where $S^1$ and $S^2$ are the actions for individual subsystems, $S^{\mathrm{int}}$ is the interaction
term, and the constant can be arbitrary. The similarity between (9) and (10)
becomes even more apparent if we think of correlation between two strings
$\alpha^1$ and $\alpha^2$ as a manifestation of interaction. To be more precise, we can
use the strings $\alpha^1$ and $\alpha^2$ to describe dynamical laws $g^1$ and $g^2$ of individual
subsystems by setting $\alpha^i = g^i(d)$. Using (9) and the definition of complexity
of a dynamical law (section 3.2), we have

$$K_d[g^1, g^2] = K_d[g^1] + K_d[g^2] - I_d[g^1 : g^2] + \mathrm{const}, \qquad (11)$$



^2 Italics introduced by this author and not in the original text.

where $I_d[g^1 : g^2] = I(\alpha^1 : \alpha^2|d)$ quantifies the strength of correlation between
the two dynamical laws governing the interacting subsystems. We draw special
attention to the fact that the property given by Eq. (6) is rather typical for
information-theoretic measures of complexity and can be easily understood 
using Venn diagrams. We can therefore anticipate that the structure of action 
for a composite physical system (10) can be understood as a consequence of a
more general property, namely the structure of complexity of dynamical laws 
governing the behaviour of the system. 

In this article, we develop the proposed approach considering a particular mea- 
sure of complexity, namely Kolmogorov complexity. Our choice of this measure 
is based on the fact that Kolmogorov complexity was specifically designed for 
quantification of the complexity of individual objects, as opposed to alterna- 
tive probabilistic approaches. The price we pay is a rather difficult or, more 
likely, unusual mathematical formalism. Kolmogorov complexity is defined 
with respect to a "reference computer"; it is an essentially discrete quantity; 
and there is no algorithm which can compute this quantity in the most general 
case. It is hard to imagine a more difficult quantity in the realm of physics 
where we are used to integrable, differentiable or even smooth functions. 

We have already mentioned that the Galilean relativity principle plays a key
role in the derivation of the Lagrangian of a single nonrelativistic particle but
alone it is not enough to explain all the postulates of Hamilton's principle
of least action. Using a simple example of a conservative nonrelativistic system
we show that the Galilean relativity principle can be combined with the SP to
answer the questions posed at the beginning of this article. The structure of
the complexity of the dynamical laws given by Eq. (11) explains the general
structure of the Lagrangian of a composite system; the integral in the definition
of the action corresponds to the sum in Eq. (7); and the minimization
procedure corresponds to finding the simplest dynamical law (consistent with
the Galilean relativity). These arguments only deal with the structure of
Hamilton's principle and can be applied beyond nonrelativistic mechanics. In
the relativistic case, for instance, the same arguments apply, the only differ- 
ence being that the Galilean relativity principle is replaced with the relativistic 
principles of Einstein. 

It can already be seen from this introduction that the invariance of action asso- 
ciated with a particular relativity principle must be satisfied by the measure of 
complexity of the dynamical laws. This is the point where the technical diffi- 
culties associated with the use of Kolmogorov complexity appear. In principle, 
by choosing an appropriate reference computer, it is possible to set the Kol- 
mogorov complexity to any function at any finite set of strings. The problem 
is that it can be difficult to construct a "natural" example of such a computer. 
The good news is that the relativity principles have nothing to do with the 
properties of action like the structure (10). This means that in order to check
the consistency of the SP with a particular relativity principle it is sufficient 
to consider only the free particle case. 

In our derivation of Newton's second law we construct a rather artificial ex- 
ample of a reference computer which satisfies all the constraints imposed by 
the Galilean relativity principle on the complexity of physical laws. It is inter- 
esting, however, that the complications encountered in the case of Newtonian 
mechanics disappear in the relativistic case, even though the arguments are 
absolutely analogous. Mathematically, this is due to the fact that the square 
of the four-velocity of a relativistic particle is always equal to one, whereas the
squared velocity of a nonrelativistic particle, which appears in the Lagrangian, 
depends on the reference frame. It is tempting to assume that the special 
role of time in Newtonian mechanics is to blame for the complications. At 
the moment, however, there is no evidence that the reference computer in the 
nonrelativistic case cannot be constructed in a more elegant way. 

In conclusion of this section, it is important to emphasize that, in this article, 
we use only a small fraction of Kolmogorov complexity calculus. Kolmogorov 
complexity is rich in properties which can be useful in fundamental physics. 
As a simple example, consider the Kraft inequality which demands that the
sums of the type $\sum_{f} 2^{-K[f]}$ are convergent. The proposed analogy between
Kolmogorov complexity and action suggests that the Kraft inequality may be
useful in the context of the path integral approach (see section ^ for some 
details). 
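
The Kraft inequality itself is easy to check on small codes. The following Python sketch is a minimal illustration under stated assumptions: the helper name `kraft_sum` and the example codes are hypothetical, and the sum is taken over codeword lengths of a finite code rather than over complexities of dynamical laws.

```python
from fractions import Fraction

def kraft_sum(code):
    """Sum of 2**(-len(w)) over the codewords w, computed exactly."""
    return sum(Fraction(1, 2 ** len(w)) for w in code)

complete = kraft_sum({"0", "10", "110", "111"})  # instantaneous code, saturates the bound
partial = kraft_sum({"0", "10"})                 # instantaneous code, below the bound
not_prefix_free = kraft_sum({"0", "1", "00"})    # "0" is a prefix of "00"
```

For the two instantaneous codes the sum does not exceed one, while the code that is not prefix-free exceeds it, so no instantaneous code can have its set of string lengths.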



4 Main derivations 

Given a physical system, consider the set $\mathbb{S}$ of all possible states of the system.
For any initial state $\xi_0 \in \mathbb{S}$ and for any dynamical law $f$ the entire system
evolution is given by the trajectory $f(\xi_0) = \{\xi_1, \xi_2, \dots\}$ in the "state space" $\mathbb{S}$
of the system. This definition is general enough in that a continuous trajectory
can be defined as a sequence of points $\xi_s$, where $s$ is a continuous parameter.
In the case of a composite system, one can also introduce state spaces and
dynamical laws for every subsystem. In general, these dynamical laws are not
independent but correlated due to physical interaction between the subsystems.
One way to study that correlation in detail is the complexity analysis based
on the earlier defined notion of Kolmogorov complexity of a function. In this
case a coarse-graining of the state space is often necessary as we need finite
binary strings $\tilde{\xi}_k$ to address the points $\xi_k$ in $\mathbb{S}$ which is often a continuum.
It is convenient to identify binary strings with the coarse-grained numerical
values they represent. One can also assume, without loss of generality, that
the coarse-graining of the state space can be made as fine as necessary at
the cost of increasing the length of the binary strings used.
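
A minimal Python sketch of such a coarse-graining is given below for a single real coordinate. It rests on assumptions made only for this illustration: the coordinate is rescaled to the interval [0, 1), and the helper names `coarse_grain` and `value_of` are hypothetical. Each additional bit halves the cell width, so the approximation error is bounded by 2 to the power of minus the string length.

```python
def coarse_grain(x: float, n_bits: int) -> str:
    """Binary string of length n_bits labelling the cell of width 2**-n_bits that contains x."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x is assumed to be rescaled to the interval [0, 1)")
    return format(int(x * 2 ** n_bits), f"0{n_bits}b")

def value_of(bits: str) -> float:
    """Coarse-grained numerical value represented by a binary string."""
    return int(bits, 2) / 2 ** len(bits)

coarse = coarse_grain(0.3, 4)  # four-bit label
fine = coarse_grain(0.3, 8)    # eight-bit label of the same point
```

Identifying each string with `value_of(bits)` corresponds to the convention adopted in the text, and lengthening the string from four to eight bits reduces the error for the same point.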

Let a sequence of finite binary strings $\tilde{\xi}_0, \tilde{\xi}_1, \dots, \tilde{\xi}_J$ represent some $J + 1$ points
of a coarse-grained trajectory in the state space. A function ${}^{J}\!f : \mathbb{S} \to \mathbb{S}_1 \times \cdots \times \mathbb{S}_J$
($\mathbb{S}_j = \mathbb{S}$) of the collective variable $\tilde{\xi}$ defines a coarse-grained dynamical
law if

$${}^{J}\!f(\tilde{\xi}_0) = \{\tilde{\xi}_1, \tilde{\xi}_2, \dots, \tilde{\xi}_J\}. \qquad (12)$$

In the standard approach we consider a set of continuous differentiable trajectories
with fixed initial and final conditions. Such trajectories can obviously
be approximated to any degree of accuracy by the ${}^{J}\!f$-type of functions. To get
a better approximation all we need to do is to increase the length of binary
strings, keep $\tilde{\xi}_0$ and $\tilde{\xi}_J$ fixed while adding more points in between. Formally
we write

$$f = \lim_{J \to \infty} \lim_{|\tilde{\xi}| \to \infty} {}^{J}\!f, \qquad (13)$$

where the limit indicates that the continuous trajectory $f(\xi_0)$ has an infinite
number of points ($J \to \infty$) each specified to an infinite accuracy ($|\tilde{\xi}| \equiv
\sum_k |\tilde{\xi}_k| \to \infty$). In any situation when we are interested in continuous laws,
we would normally start from the continuous trajectory $f(\xi_0)$ so there is no
ambiguity in the definition of the limit.

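According to the definition of complexity of a function (subsection 3.2), the
complexity of a coarse-grained dynamical law ${}^{J}\!f$ is given by

$$K_{\tilde{\xi}_0}[{}^{J}\!f] = \frac{1}{J} \sum_{k=0}^{J-1} K(\tilde{\xi}_{k+1}|\tilde{\xi}_k). \qquad (14)$$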

The SP can now be formulated as a variational problem of minimizing $K_{\tilde{\xi}_0}[{}^{J}\!f]$
over all dynamical laws $\{{}^{J}\!f\}$ that are consistent with all other physical axioms.
It is important to acknowledge that the SP is just an inference tool
and additional axioms are needed for finding physical dynamics. It is for this
reason that we cannot narrow the set of dynamical laws down to the trivial
law $(\tilde{r}_{k+1}, \dot{\tilde{r}}_{k+1}) = (\tilde{r}_k, \dot{\tilde{r}}_k)$, which is intuitively the simplest. In Newtonian
mechanics, for instance, the minimal set of axioms includes the SP and the
Galilean relativity principle. The trivial law would violate the Galilean relativity
principle as a particle cannot be at rest in more than one reference frame.

4.1 Newtonian Mechanics

Consider the state space of a Newtonian mechanical system where time, positions
and velocities of the particles are combined into a collective variable^3
$\xi = (r, \dot{r}, t)$ to represent every possible state of the system [15]. Each point $\tilde{\xi}_k$
of a coarse-grained trajectory approximates a point $\xi_k = (r_k, \dot{r}_k, t_k)$ in the state
space. We can therefore define $\tilde{\xi}_k = (\tilde{r}_k, \dot{\tilde{r}}_k, \tilde{t}_k)$ where $\tilde{r}_k$, $\dot{\tilde{r}}_k$ and $\tilde{t}_k$ are finite
binary strings approximating the values of $r_k$, $\dot{r}_k$ and $t_k$ in the real-parameter
state space.

We shall follow the standard approach and choose the time $t = \tilde{t}_k$ instead of
an abstract parameter $k$ to define the order of events $\{\tilde{\xi}_k\}$ in the trajectory
${}^{J}\!f(\tilde{\xi}_0)$. This reflects the absolute nature of time in Newtonian mechanics and
is not a necessary requirement of our approach. We put all elements of the
set $\{t\}$ in order such that for any two consecutive times $t$ and $t + \Delta t$ we have
$\Delta t > 0$. For any function of time $\phi_t$ we define $\Delta\phi_t \equiv \phi_t - \phi_{t-\Delta t}$ and the
discrete time derivative is defined as

$$\dot{\tilde{r}}_t \equiv \frac{\Delta \tilde{r}_t}{\Delta t} = \frac{\tilde{r}_t - \tilde{r}_{t-\Delta t}}{\Delta t}. \qquad (15)$$

We see that if |^| = ^^. l^^l is taken to infinity one can chose the lengths of 
{fk} and {ik} such that rj approaches f*. This fact is used every time we 
approximate derivatives by ratios of finite differences on a computer. Formally 
we write 

Vt = lini lim Vt , (16) 

At^Olfl^oo 



where, as in the case of Eq. (13), the double limit means that, in practice,
sufficiently long binary strings ($|\xi| \to \infty$) should be used for any finite $\Delta t$
(i.e., longer strings are needed for better precision). Equation (16) suggests
that the above definition of $\dot{\mathtt{r}}_t$ can be used for construction of $\xi_t = (\mathtt{r}_t, \dot{\mathtt{r}}_t, t)$
which is by definition a coarse-grained approximation of $\zeta_t = (r_t, \dot r_t, t)$. The
second discrete time derivative is, by analogy, defined as

$$\ddot{\mathtt{r}}_t = \frac{\Delta\dot{\mathtt{r}}_t}{\Delta t} = \frac{\dot{\mathtt{r}}_t - \dot{\mathtt{r}}_{t-\Delta t}}{\Delta t}. \qquad (17)$$
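The double limit in Eq. (16) can be illustrated numerically. The following is a minimal sketch and not part of the original derivation: it assumes that a finite binary string can be modelled by rounding to a fixed number of fractional bits, and the names `quantize`, `discrete_velocity` and `velocity_error` are hypothetical. It applies the backward difference of Eq. (15) to $r(t) = \sin t$ and compares the result with the exact velocity $\cos t$.

```python
import math


def quantize(x, bits):
    """Round x to `bits` fractional binary digits (a stand-in for a finite binary string)."""
    scale = 2 ** bits
    return round(x * scale) / scale


def discrete_velocity(r, t, dt, bits):
    """Backward difference of Eq. (15), applied to quantized positions."""
    return (quantize(r(t), bits) - quantize(r(t - dt), bits)) / dt


def velocity_error(dt, bits, t=1.0):
    """Absolute error of the discrete velocity of r(t) = sin(t) against cos(t)."""
    return abs(discrete_velocity(math.sin, t, dt, bits) - math.cos(t))


if __name__ == "__main__":
    for dt in (1e-1, 1e-3, 1e-6):
        for bits in (4, 20, 40):
            print(f"dt={dt:g} bits={bits:2d} error={velocity_error(dt, bits):.3e}")
```

With short strings, shrinking $\Delta t$ makes the estimate worse rather than better, which is why the limit $|\xi| \to \infty$ has to be taken before $\Delta t \to 0$.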

In the new parameterization Eq. (14) becomes

$$K_{\xi_0}[{}^{J}\!f] = \frac{1}{J}\sum_{t=t_0}^{t_{J-1}} K(\xi_{t+\Delta t}\,|\,\xi_t), \qquad (18)$$


where the sum over $t$ goes through the set $\{t_k\}_{k=0}^{J}$; we choose for simplicity
$t = \mathtt{t}_k = k\Delta t$, where $\Delta t$ is now a constant. Like any other re-parameterization,
this relation can be absorbed into the definition of the reference computer in
the form of a subroutine which is always executed to calculate $t + \Delta t$ given $t$.
Equation (18) becomes

$$K_{\xi_0}[{}^{J}\!f] = \frac{1}{J\Delta t}\sum_{t=t_0}^{t_{J-1}} K(\mathtt{r}_{t+\Delta t}, \dot{\mathtt{r}}_{t+\Delta t}\,|\,\mathtt{r}_t, \dot{\mathtt{r}}_t, t)\,\Delta t. \qquad (19)$$

¹ Here we omit the subscript enumerating the particles for the sake of convenience of
notation.

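It is also convenient to absorb the definitions of discrete time derivatives,
Eqs. (15) and (17), into the definition of the reference computer. In this
case $\mathtt{r}_{t+\Delta t}$ and $\dot{\mathtt{r}}_{t+\Delta t}$ can be set for automatic evaluation from $\ddot{\mathtt{r}}_{t+\Delta t}$ given $\mathtt{r}_t$
and $\dot{\mathtt{r}}_t$. We therefore have

$$K_{\xi_0}[{}^{J}\!f] = \frac{1}{J\Delta t}\sum_{t=t_0}^{t_{J-1}} K(\ddot{\mathtt{r}}_{t+\Delta t}\,|\,\mathtt{r}_t, \dot{\mathtt{r}}_t, t)\,\Delta t. \qquad (20)$$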
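The automatic evaluation described above can be sketched as a short subroutine. This is an illustrative assumption about how such a subroutine might look, not the author's construction, and the function name `advance_state` is hypothetical. Inverting the backward differences of Eqs. (15) and (17) gives $\dot{\mathtt{r}}_{t+\Delta t} = \dot{\mathtt{r}}_t + \ddot{\mathtt{r}}_{t+\Delta t}\Delta t$ and $\mathtt{r}_{t+\Delta t} = \mathtt{r}_t + \dot{\mathtt{r}}_{t+\Delta t}\Delta t$, so only the acceleration has to be supplied.

```python
from fractions import Fraction


def advance_state(r, v, a_next, dt):
    """Return (r, v) at t + dt from (r, v) at t and the acceleration at t + dt.

    Inverts the backward differences
        a_next = (v_next - v) / dt
        v_next = (r_next - r) / dt
    """
    v_next = v + a_next * dt
    r_next = r + v_next * dt
    return r_next, v_next


if __name__ == "__main__":
    # Exact rational arithmetic stands in for finite binary strings (an assumption
    # made here only to avoid floating-point rounding noise).
    r, v, dt = Fraction(1), Fraction(2), Fraction(1, 4)
    print(advance_state(r, v, Fraction(3), dt))
```

Applying the backward differences to the returned state recovers the supplied acceleration exactly, which is the sense in which the two derivative definitions can be absorbed into the reference computer.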

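Let us fix the time $\tau = t_J - t_0$ in which we investigate the system evolution.
We have

$$K_{\xi_0}[{}^{J}\!f] = \frac{1}{\tau}\sum_{t=0}^{\tau-\Delta t} K(\ddot{\mathtt{r}}_{t+\Delta t}\,|\,\mathtt{r}_t, \dot{\mathtt{r}}_t, t)\,\Delta t, \qquad (21)$$

where the sum over $t$ goes from $0$ to $\tau - \Delta t$ in steps of $\Delta t$. From this equation
we can already see that the complexity of dynamical laws is determined by the
complexity of the acceleration $\ddot{\mathtt{r}}_{t+\Delta t}$ given the system state $\xi_t = (\mathtt{r}_t, \dot{\mathtt{r}}_t, t)$ in
the immediately preceding past.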

Consider the simple example of a single free particle which at the instants $0$
and $\tau$ is in the states $\xi_0$ and $\xi_\tau$ respectively. The physical dynamics which
are to be found by minimizing $K_{\xi_0}[{}^{J}\!f]$ over all ${}^{J}\!f$ with fixed $\xi_0$ and $\xi_\tau$ should
satisfy the Galilean relativity principle. This restricts the set of dynamics $\{{}^{J}\!f\}$
to those for which

$$K(\ddot{\mathtt{r}}_{t+\Delta t}\,|\,\mathtt{r}_t, \dot{\mathtt{r}}_t, t) = \frac{m\dot{\mathtt{r}}_t^2}{2} + \frac{\Delta Q(\zeta_t)}{\Delta t}, \qquad (22)$$

where $m$ is a positive coefficient and $Q(\zeta_t)$ is an arbitrary function of the state
of the system. Before we proceed with the proof of this, it is relevant to recall
the Galilean relativity principle.

The Galilean relativity principle is based on the notion of inertial reference 
frames in which, by definition, the laws of mechanics take their simplest form 
(Ref. [11], §3). Mathematically, inertial reference frames are defined through
a number of properties which are commonly known as Newton's first law.
These properties imply homogeneity and isotropy of space and homogeneity 
of time. Moreover, in the inertial reference frame where a free body is at rest 
at some instant it remains always at rest. And finally, the coordinates r and 
r' of a given point in two different inertial frames are related by the Galilean 
transform 

r = r' + vt, (23) 

where it is understood that time is the same in the two frames (t = t') and 
that the second frame moves relative to the first one with velocity v. It follows 
directly from these definitions that velocity of a free particle is constant in any 
inertial frame, i.e. $\ddot{\mathtt{r}}_{t+\Delta t} = 0$.

The proof of equation (22) is constructed as follows. Suppose we want to interpolate
$K(\ddot{\mathtt{r}}_{t+\Delta t}\,|\,\mathtt{r}_t, \dot{\mathtt{r}}_t, t)$ by some well behaved function $L_1$ which is defined on
real numbers, but coincides with $K(\ddot{\mathtt{r}}_{t+\Delta t}\,|\,\mathtt{r}_t, \dot{\mathtt{r}}_t, t)$ on the coarse-grained
trajectory where $K$ is defined. Assuming that such an $L_1$ exists, we determine its
properties and prove that on the coarse-grained state space it would behave as
suggested by Eq. (22). We then show that there exist infinitely many reference
computers for which $K$ is consistent with Eq. (22). In such cases the interpolation
$L_1$ of $K$ can be found as assumed, because it is uniquely (up to the
total time derivative) defined by Eq. (22). In summary, we demonstrate that
by choosing an appropriate reference computer $W$, the complexity measure $K$
can be made to satisfy all the requirements imposed by the Galilean relativity
principle in the framework of the SP. In this argument we use the standard
formulation of the Galilean relativity principle on the continuum. Alternatively,
we could reformulate the Galilean relativity principle in a discrete form and
try to apply it directly to $K$ without introducing $L_1$. This will be considered
in future research. However for now, we shall keep the standard formulation
of the Galilean relativity principle on the continuum, and demonstrate the
relation between the continuous and the discrete formulations of Newtonian
mechanics. Later we will see that in the relativistic case all the arguments can
be performed without introducing $L_1$.



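For our case of a single free particle we have $\ddot{\mathtt{r}}_{t+\Delta t} = 0$, and therefore $L_1$ is a
function of only two vector and one scalar arguments (coordinates $\mathtt{r}_t$, velocity
$\dot{\mathtt{r}}_t$ and time $t$). Substituting such an $L_1$ instead of $K$ into Eq. (21) we have

$$K_{\xi_0}[{}^{J}\!f] = \frac{1}{\tau}\sum_{t=0}^{\tau-\Delta t} L_1(\mathtt{r}_t, \dot{\mathtt{r}}_t, t)\,\Delta t. \qquad (24)$$

We require integrability of $L_1$ on $\mathbb{R}$, in which case the complexity of continuous
dynamics (13) can be quantified by

$$S[f] = \frac{1}{\tau}\lim_{\Delta t \to 0}\,\lim_{|\xi| \to \infty}\sum_{t=0}^{\tau-\Delta t} L_1(\mathtt{r}_t, \dot{\mathtt{r}}_t, t)\,\Delta t = \frac{1}{\tau}\int_0^{\tau} L_1(r_t, \dot r_t, t)\,dt. \qquad (25)$$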

Because of the formal connection of $S[f]$ to the physical laws through the
minimization procedure we shall call $L_1$ the Lagrangian and $S[f]$ the corresponding
action for a single free particle. Following the argument by Landau and Lifshitz
([11], §3,4), we note that the homogeneity and isotropy of space and
homogeneity of time in an inertial reference frame imply that the Lagrangian
$L_1$ can only depend on the absolute value of velocity

$$L_1(r, \dot r, t) = L_1(\dot r^2) + \frac{d}{dt}Q(\xi_t). \qquad (26)$$

Here the total time derivative is introduced to emphasize that with fixed initial
and final conditions the variational problem of minimizing $S[f]$ over all $f$ is
not affected by addition of any $dQ(\xi_t)/dt$ to the Lagrangian $L_1(r, \dot r, t)$. The
Galilean relativity principle requires that in the reference frame which moves
with infinitesimal velocity $\epsilon$ relative to the original inertial reference frame
the Lagrangian $L_1(\dot r'^2) = L_1(\dot r^2 + 2\dot r \cdot \epsilon + \epsilon^2)$ can differ from $L_1(\dot r^2)$ only by
the total time derivative of some function of the particle state. This implies
that $\partial L_1/\partial \dot r^2$ does not depend on the velocity, because the second term in the
expansion

$$L_1(\dot r'^2) = L_1(\dot r^2) + \frac{\partial L_1}{\partial \dot r^2}\,2\dot r \cdot \epsilon + \dots \qquad (27)$$

is a total time derivative only if it is a linear function of $\dot r$. Writing $\partial L_1/\partial \dot r^2 = m/2$
and neglecting the total time derivative we have

$$L_1(r, \dot r, t) = \frac{m\dot r^2}{2}. \qquad (28)$$
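The role of the total time derivative in this argument can be checked numerically. The sketch below is an illustration under simplifying assumptions (one dimension, unit mass, a finite boost `eps` instead of an infinitesimal one); the helper names are hypothetical. For the kinetic Lagrangian the change of the discrete action under a boost depends only on the end points of the path and its duration, whereas for a quartic Lagrangian it depends on the path itself.

```python
def kinetic(v):
    """Free-particle Lagrangian of Eq. (28) with m = 1."""
    return 0.5 * v * v


def quartic(v):
    """A Lagrangian that is not quadratic in velocity, used as a counterexample."""
    return v ** 4


def action_shift(path, dt, eps, lagrangian):
    """Sum of [L(v + eps) - L(v)] * dt along a discrete one-dimensional path."""
    total = 0.0
    for r_prev, r_next in zip(path, path[1:]):
        v = (r_next - r_prev) / dt
        total += (lagrangian(v + eps) - lagrangian(v)) * dt
    return total


# Two paths with the same end points and the same duration.
STRAIGHT = [0.0, 1.0, 2.0, 3.0, 4.0]
WIGGLY = [0.0, 2.0, 1.0, 5.0, 4.0]

if __name__ == "__main__":
    for name, lagrangian in (("kinetic", kinetic), ("quartic", quartic)):
        print(name,
              action_shift(STRAIGHT, 1.0, 0.1, lagrangian),
              action_shift(WIGGLY, 1.0, 0.1, lagrangian))
```

For the kinetic case both paths give the same shift, $\epsilon\,(r_{\rm end} - r_0) + \epsilon^2\tau/2$, so the boost cannot change which path minimizes the action.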

Because the variational problem of minimizing $K_{\xi_0}[{}^{J}\!f]$ with fixed initial and
final conditions is not affected by addition of $\Delta Q(\zeta_t)/\Delta t$ to the complexity
$K(\ddot{\mathtt{r}}_{t+\Delta t}\,|\,\mathtt{r}_t, \dot{\mathtt{r}}_t, t) = L_1(\mathtt{r}_t, \dot{\mathtt{r}}_t, t)$, the above equation implies Eq. (22) as
required. The standard derivation of the free particle Lagrangian $L_1$ used here
gives equivalent results in any coordinate system: $L_1$ is always the kinetic
energy. Likewise, our requirements on complexity are physically equivalent for
different parameterizations $\{\xi\}$ of the coarse-grained state space. The Galilean
relativity principle demands (22), so that the choice of parameterization of the
state space does not matter.

It now remains to show that the constraint on the form of the complexity
imposed by the Galilean relativity principle can be satisfied by a considerable
number of reference computers. Certainly, not every reference computer $W$
would satisfy this. This is not surprising, however, because the definition of $W$ is
far too general. To prove that there is a $W$ for which (22) is true, imagine that
we have found the physical dynamical law. For this particular law there is a
corresponding computer which by default (if given a zero-length program) calculates
the physical value of $\ddot{\mathtt{r}}_{t+\Delta t}$ from any $(\mathtt{r}_t, \dot{\mathtt{r}}_t, t)$. For such a computer, the
minimal value of complexity $K(\ddot{\mathtt{r}}_{t+\Delta t}\,|\,\mathtt{r}_t, \dot{\mathtt{r}}_t, t) = 0$ is obtained for the physical
law. This is achieved in the fixed reference frame $\mathcal{F}_0$ where the physical law
was found. In order to perform computations in any given reference frame,
we modify the computer to wait for a string of code which is appended to the
main program. This appended code accommodates the definition of the given
reference frame relative to the fixed $\mathcal{F}_0$, and reformulates the results of the
main program in the given frame. We choose the fixed frame $\mathcal{F}_0$ as the rest
frame of our free particle, and require that the appended code has the fixed
length of $m\dot{\mathtt{r}}_t^2/2 + c$, where $c$ is a constant. The computer would read $\dot{\mathtt{r}}_t$ from
the given data and perform calculations only if the appended code has the
required length. Because $c$ can be rather big, the shortest program describing
a dynamical law ${}^{J}\!f$ would use the unnecessary space in the appended code to
encode as much information about ${}^{J}\!f$ as possible. In fact, for simple ${}^{J}\!f$ the
length of the shortest program will coincide with the length of the appended
code. We have therefore constructed a computer which satisfies requirement (22) for
simple dynamical laws and, moreover, because we can choose $c$ in a countably
infinite number of ways, we know that there are at least a countably infinite
number of computers which satisfy Eq. (22). This construction, although
mathematically consistent, is rather artificial as the length of the appended
code explicitly depends on the system state. This unpleasant feature disappears
in the relativistic case where the appended code is not necessary. This
completes the consideration of the case of a single nonrelativistic particle. The
same arguments can be used to study the system of $N$ noninteracting particles
because they can be considered independently from one another.

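To study the case of $N$ interacting particles we split the total system state space
$\mathbb{S} = \{\xi_t\}$ into subspaces $\mathbb{S}^n = \{\xi_t^n\}$ that correspond to individual particles so
that $\mathbb{S}^1 \times \mathbb{S}^2 \times \cdots \times \mathbb{S}^N = \mathbb{S}$. One easy way to do this is to construct a state
space for every particle $\mathbb{S}^n = \{\xi_t^n\}$ and then use the bijection $B$ to form the
total state space as $\mathbb{S} = \{\xi_t \,|\, \xi_t = B(\xi_t^1, \dots, \xi_t^N)\}$. Using (H), equation (21) for
the entire system can be rewritten as

$$K_{\xi_0}[{}^{J}\!f] = \frac{1}{\tau}\sum_{t=0}^{\tau-\Delta t} K(\xi_{t+\Delta t}\,|\,\xi_t)\,\Delta t = \frac{1}{\tau}\sum_{t=0}^{\tau-\Delta t}\left(\sum_{n=1}^{N} K(\xi^n_{t+\Delta t}\,|\,\xi_t) - V_N + E\right)\Delta t, \qquad (29)$$

where $E$ is a constant of motion and, up to the additive total time derivative,

$$V_N = \sum_{n=1}^{N-1} I(\xi^n_{t+\Delta t} : \xi^{n+1}_{t+\Delta t}, \dots, \xi^N_{t+\Delta t}\,|\,\xi_t). \qquad (30)$$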

We have already mentioned that any interaction between subsystems manifests
itself in the correlation of their dynamics. This correlation is quantified by the
mutual information and, for this reason, we will call $V_N$ the interaction term.
Function (30) obviously contains the Newtonian case of binary interaction
$V_N = \sum_{j<k} V_{jk}$ where $V_{jk}$ stands for the interaction between particles $j$ and
$k$. In subsequent work we will present a more detailed study of $V_N$. Here
we suppose that interaction between subsystems is known from experiment
so that the interaction term is given as a function on the state space of the
system. For simplicity we will consider the case when $V_N$ is a function only
of the coordinates $\{r^n\}_{n=1}^{N}$. Moreover, it would be misleading to consider
a more general case, since velocity dependent forces appear in mechanics as
an attempt to include friction or electromagnetic interactions. Friction is
essentially an effective phenomenon: that is, there is no single fundamental
interaction which describes friction without further approximations. Electromagnetic
interactions are also irrelevant as they are not Newtonian. For these
reasons, velocity independent potentials are used as a standard requirement
for fundamental derivations in nonrelativistic mechanics (Ref. [11], §5).

The question now is to determine $K(\xi^n_{t+\Delta t}\,|\,\xi_t)$. During a sufficiently short
interval of time $\Delta t$, the velocities of the particles can be treated as constants,
which is analogous to the case of zero interaction. One can therefore repeat the
arguments as in the case of Eq. (22), with the reservation that the coefficient
$m$ can depend on time and can be different for every particle in the system.
Physically, this would correspond to the most general case when particles'
masses are changing with time like, for instance, in jet motion. Mathematically,
this possibility arises because the arguments should be repeated for every short
interval of time $\Delta t$ independently. In doing so we should treat $\ddot{\mathtt{r}}^n_{t+\Delta t}$ as a
constant because it relates asymptotically constant velocities of the particles
during different intervals $\Delta t$. As explained above, for sufficiently small $\Delta t$, we
have:

$$K_{\xi_0}[{}^{J}\!f] = \frac{1}{\tau}\sum_{t=0}^{\tau-\Delta t}\left(\sum_{n=1}^{N}\frac{m_n(\dot{\mathtt{r}}^n_t)^2}{2} - V_N + E\right)\Delta t. \qquad (31)$$

Discrete formulation of Newton's second law appears as a necessary condition
for the variational problem of minimizing $K_{\xi_0}[{}^{J}\!f]$, which for the case of one
particle moving in a potential gives, as shown in the Appendix,

$$\frac{\Delta}{\Delta t}\left[m\left(\dot{\mathtt{r}}_t + \delta\dot{\mathtt{r}}_t/2\right)\right] = -\frac{\delta V(\mathtt{r}_{t-\Delta t})}{\delta\mathtt{r}_{t-\Delta t}}. \qquad (32)$$

The right hand side of this equation is a discrete variational derivative as
defined in the Appendix. We see that the acceleration is determined by
the force acting on the particle in the immediately preceding past. We can
therefore conclude that the force is the cause of acceleration in the inertial
reference frame. Taking the continuum limit $|\xi| \to \infty$ followed by $\Delta t \to 0$ we
recover the standard formulation of Newton's second law

$$\frac{d}{dt}\left(m\dot r\right) = -\frac{\partial V}{\partial r}. \qquad (33)$$
Note that for the investigation of causality and related topics such as the arrow 
of time, our discrete formulation of Newton's second law is better suited than 
the standard formulation. We shall leave these topics for further research and 
proceed with important special cases where the difference between the discrete 
and the differential formulations of dynamical laws can be neglected. In these 
cases the differential form of the dynamical laws can be determined by applying 
techniques of standard variational calculus for minimizing the functional 




$$A_S = \alpha\int_0^{\tau}\left(\sum_{n=1}^{N}\frac{m_n(\dot r^n_t)^2}{2} - V_N + E\right)dt \qquad (34)$$

with the fixed end points $\xi_0$ and $\xi_\tau$. The connection between equations (31)
and (34) suggests that, for small enough $\tau$, this functional must be positive
definite and have a minimum for some fixed values of the constants $\alpha$ and $E$.

Now we shall address the question of whether $A_S$ is positive definite and has a
minimum in the context of the physical meaning of $E$. For any function
$V_N = V_N(r^1, \dots, r^N)$ and constant values of $\{m_n\}_{n=1}^{N}$ the Euler-Lagrange equations
for $A_S$ imply that

$$\frac{d}{dt}\left(\sum_{n=1}^{N}\frac{m_n(\dot r^n_t)^2}{2} + V_N\right) = 0, \qquad (35)$$

which is the well known law of conservation of energy. Since $E$ is a constant of
motion it may only depend on fundamental constants and integrals of motion.
In our case the energy is generally the only such integral of motion. Since
$V_N$ belongs to a very broad class of functions the only way to ensure that
$A_S$ is positive definite is to require that $E$ contains $V_N$ with the plus sign to
compensate $-V_N$ in the Lagrangian. Therefore it is necessary that up to an
insignificant constant, which can be absorbed in (35), we have

$$E = \sum_{n=1}^{N}\frac{m_n(\dot r^n_t)^2}{2} + V_N. \qquad (36)$$

For sufficiently small $\tau$, this value of $E$ corresponds to a positive definite $A_S$
which has a minimum as required. Indeed, substitution of (36) into (34) gives

$$A_S = \alpha\int_0^{\tau}\sum_{n=1}^{N} m_n(\dot r^n_t)^2\,dt, \qquad (37)$$

which for $\alpha > 0$ is a sum of essentially positive terms. The minimum should
be found using the additional requirement (36) as an auxiliary condition. To
emphasize the analogy with the relativistic case considered in the next subsection,
we reformulate this result in purely geometrical terms [^. The coordinate
transformation from the Cartesian $(x_n, y_n, z_n)$ to other generalized
coordinates $q_j$ implies that there exists symmetric $m_{jk}$ such that

$$\sum_{n=1}^{N} m_n(\dot r^n_t)^2 = \sum_{j,k} m_{jk}\,\dot q_j\,\dot q_k. \qquad (38)$$

Using the condition (36) we have

$$\left[\sum_{n=1}^{N} m_n(\dot r^n_t)^2\,dt\right]^2 = \sum_{j,k} 2(E - V_N)\,m_{jk}\,dq_j\,dq_k, \qquad (39)$$

and therefore the symmetric matrix

$$g_{jk} = 2(E - V_N)\,m_{jk} \qquad (40)$$

can be used to define a line element of the form

$$(dl)^2 = \sum_{j,k} g_{jk}\,dq_j\,dq_k. \qquad (41)$$

Using the definitions (40), (41) and Eq. (39), we can rewrite (37) as

$$A_S = \alpha\int_{\xi_0}^{\xi_\tau} dl, \qquad (42)$$

where we emphasize that the sign of $\alpha$ must be chosen to compensate the
sign degeneracy $dl = \pm\sqrt{(dl)^2}$, that is to keep (42) a minimum principle.



Equation (42) shows that the minimum principle is equivalent to the problem of
finding a geodesic path between two fixed end-points $\xi_0$ and $\xi_\tau$ in the system's
configuration space defined by the Riemannian metric $g_{jk}$. The metric in its
turn was derived using only

• the SP, and

• the Galilean principle of relativity.
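The equivalence between Eqs. (37) and (42) can be checked on a simple trajectory. The sketch below is an illustration, not part of the derivation: it assumes one particle of unit mass in one dimension rising in a uniform field $V(q) = mgq$ with $\alpha = 1$, for which $m_{jk}$ reduces to the mass, and all names and numerical values are hypothetical. It evaluates the action once as $\int m\dot q^2\,dt$ and once as the length $\int dl$ with $(dl)^2 = 2(E - V)\,m\,(dq)^2$.

```python
import math

M, G, V0, T_END = 1.0, 1.0, 2.0, 1.0   # illustrative values; the particle keeps rising for t < V0 / G
ENERGY = 0.5 * M * V0 ** 2             # conserved energy for q(0) = 0


def position(t):
    """Exact trajectory q(t) = V0 * t - G * t**2 / 2 in the uniform field."""
    return V0 * t - 0.5 * G * t * t


def potential(q):
    return M * G * q


def action_from_velocity(n):
    """Riemann sum of m * qdot**2 * dt, Eq. (37) with alpha = 1."""
    dt = T_END / n
    total = 0.0
    for i in range(n):
        qdot = (position((i + 1) * dt) - position(i * dt)) / dt
        total += M * qdot * qdot * dt
    return total


def action_from_length(n):
    """Sum of line elements dl = sqrt(2 (E - V) m) |dq|, Eqs. (40)-(42) with alpha = 1."""
    dt = T_END / n
    total = 0.0
    for i in range(n):
        q0, q1 = position(i * dt), position((i + 1) * dt)
        q_mid = 0.5 * (q0 + q1)
        total += math.sqrt(2.0 * (ENERGY - potential(q_mid)) * M) * abs(q1 - q0)
    return total


if __name__ == "__main__":
    print(action_from_velocity(10000), action_from_length(10000))
```

For these values both sums approach $\int_0^1 (2 - t)^2\,dt = 7/3$, so on the physical trajectory the action is the geodesic length in the metric $g_{jk}$.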



4.2 Relativity 

In the previous section we demonstrated our approach using the example of a 
conservative nonrelativistic mechanical system. Our arguments can be summa- 
rized into three stages. First, we used the SP and the properties of Kolmogorov 
complexity to obtain Hamilton's principle of least action together with the 
general structure of Lagrangians. Second, we used the classical arguments by 
Landau and Lifshitz to specify the Lagrangians of individual particles. Third, 
we constructed a reference computer to check whether the second stage is con- 
sistent with our choice of Kolmogorov complexity as a measure of complexity 
for dynamical laws. 

In this section we consider the case of relativistic systems. The first stage of our 
arguments is identical to the one of the nonrelativistic case. This means that 
in this section we can start our arguments directly from the second stage, i.e. 
consider one particle cases such as a free relativistic particle and a relativistic 
particle in an external gravitational field. To do this we will need to
replace the Galilean principle with Einstein's principle of relativity. For the 
third stage of the argument we can use the arguments of the previous section 
as a template. 

We will see that the derivations of this section are considerably simpler and 
more natural than in the case of Newtonian mechanics. In particular, we do 
not construct an integrable and twice differentiable interpolation $L_1$ of the
complexity function. More simplification is achieved in the construction of 
examples of reference computers which are consistent with our derivations. 
In the nonrelativistic case we required that the reference computer should 
be supplied with a description of the reference frame where the problem is 
formulated. Such a description can be supplied in the form of code which is 
appended to the main program. In the nonrelativistic case we showed that 
the appended code of fixed length $m\dot{\mathtt{r}}_t^2/2 + \mathrm{const}$ does the job. Even though
this is a mathematically consistent requirement, it is rather artificial that the 
length of the code depends on the system state. We will see that the analogous 
construction in the relativistic case does not require the appended code at all. 
At the end of this section we outline the possibility of geometrical formulations 
of our approach and show how one can derive the theory of particle motion in 
an external gravitational field. 

The physical state of a relativistic mechanical system is described by the same
set of parameters $\zeta_k = (r_k, \dot r_k, t_k)$ as in the case of a Newtonian mechanical
system. The concept of absolute time, however, is in deep contradiction with
the Einstein principle of relativity. For this reason it is not convenient to
choose time as the abstract parameter $k$ in Eq. (14) as it was done in the case
of Newtonian mechanics. If $c$ is the speed of light, the Einstein principle of
relativity for the case of homogeneous isotropic space and homogeneous time
suggests that the quantity

$$(\Delta s)^2 = c^2(\Delta t)^2 - (\Delta x)^2 - (\Delta y)^2 - (\Delta z)^2 \qquad (43)$$

is the same in all inertial reference frames. It is convenient to choose this quantity
for the parameterization in Eq. (14) in the same way as time was chosen
in the case of Newtonian mechanics. Fixing the limits of parameterization
$s \in [0, \varsigma]$ we have, by analogy with Eq. (21),

$$K_{\xi_0}[{}^{J}\!f] = \frac{1}{\varsigma}\sum_{s=0}^{\varsigma-\Delta s} K(\xi_{s+\Delta s}\,|\,\xi_s)\,\Delta s, \qquad (44)$$

where the sum over $s$ goes from $0$ to $\varsigma - \Delta s$ in steps of $\Delta s$.

The Einstein principle of relativity requires that the dynamical laws obtained
by minimization of $K_{\xi_0}[{}^{J}\!f]$ must be invariant with respect to the Lorentz
transformations which relate different inertial reference frames to each other.
Considering the case of one free particle, we also have the requirement of
homogeneity of the space and time which requires that $K(\xi_{s+\Delta s}\,|\,\xi_s)$ cannot depend
on the time or the coordinates of the particle, i.e. is a function only of the
four-velocity

$$(u_0, u_1, u_2, u_3) = \left(\frac{\Delta ct}{\Delta s}, \frac{\Delta x}{\Delta s}, \frac{\Delta y}{\Delta s}, \frac{\Delta z}{\Delta s}\right). \qquad (45)$$

The Lorentz transformations can be considered as rotations in four-dimensional
space with the metric $g^{jk} = \mathrm{diag}(1, -1, -1, -1)$. This means that $K(\xi_{s+\Delta s}\,|\,\xi_s)$
cannot depend on the direction of the four-velocity. Since $u_j u_k g^{jk} = 1$, the
absolute value of the four-velocity is a constant and therefore

$$K(\xi_{s+\Delta s}\,|\,\xi_s) = \mathrm{const} + \frac{\Delta Q}{\Delta s}, \qquad (46)$$

where $Q$ is an arbitrary function of the system state. This equation is a
relativistic analogue of Eq. (22). Substitution of (46) into (44) gives

$$K_{\xi_0}[{}^{J}\!f] = \alpha\sum_{s=0}^{\varsigma-\Delta s}\Delta s. \qquad (47)$$

Choosing $\Delta s = +\sqrt{(\Delta s)^2}$ we see that $K_{\xi_0}[{}^{J}\!f]$ has a minimum for negative $\alpha$. In
the usual case, when dynamical laws can be interpolated by twice differentiable
functions, the above equation becomes

$$A_S^{\rm rel} = \alpha\int_0^{\varsigma} ds + \mathrm{const}. \qquad (48)$$

Minimization of this quantity over possible dynamical laws gives the known 
equations of motion for a free relativistic particle. In other words, the SP 
combined with the Einstein principle of relativity is enough to obtain the 
Lagrangian of a free relativistic particle. 

As in the case of Eq. (22), we must show that there exists a considerable number
of reference computers which satisfy requirement (46). This can be shown
by repeating the arguments of the previous section. As in the nonrelativistic
case we obtain a family of countably infinitely many computers which satisfy
requirement (46). This time, however, the arguments can be simplified since
the left-hand side of (46) does not contain terms quadratic in velocity. Moreover,
the constant term in (46) can be absorbed into $\Delta Q/\Delta s$ which means
that the appended code, artificially required in the Newtonian mechanics, is
not necessary in the relativistic case.

It remains to show that $A_S^{\rm rel}$ can be made positive definite. Writing the constant
as $\alpha c E\tau$, where $E$ is an integral of motion and $\tau$ is the time elapsed between
the boundary events $\xi_0$ and $\xi_\varsigma$ we have

$$A_S^{\rm rel} = \alpha c\int_0^{\tau}\left(\sqrt{1 - \dot r^2/c^2} + E\right)dt. \qquad (49)$$

This equation is a relativistic analogue of Eq. (34) in the case of one particle.
As in Newtonian mechanics, it is easy to see that $A_S^{\rm rel}$ is positive definite when
$E$ is equal to the energy of the system. Indeed, for small $\dot r^2$ equation (49)
becomes

$$A_S^{\rm rel} = \int_0^{\tau}\left(-mc^2 + \frac{m\dot r^2}{2} + \dots + E\right)dt, \qquad (50)$$

where $m = |\alpha|/c$. For small $\dot r$ the relativistic energy of a particle is
$(mc^2 + m\dot r^2/2 + \dots)$; therefore $E$ contains $+mc^2$ compensating the negative term in
(50), and $A_S^{\rm rel}$ is positive as required. The action remains positive definite for
any values of $\dot r$ because the relativistic energy grows monotonically with $\dot r^2$.
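The small-velocity expansion and the sign argument can be verified numerically. This is a minimal sketch in units with $c = 1$ and $m = 1$, assuming $\alpha < 0$ so that $\alpha c = -mc^2$ and writing the integrand in the form used in Eq. (50); the function names are hypothetical.

```python
import math

C = 1.0   # speed of light in illustrative units
M = 1.0   # mass, m = |alpha| / c


def relativistic_term(v):
    """alpha * c * sqrt(1 - v**2 / c**2) with alpha = -M * C."""
    return -M * C ** 2 * math.sqrt(1.0 - (v / C) ** 2)


def low_velocity_term(v):
    """Leading terms -m c**2 + m v**2 / 2 appearing in Eq. (50)."""
    return -M * C ** 2 + 0.5 * M * v ** 2


def relativistic_energy(v):
    """Energy m c**2 / sqrt(1 - v**2 / c**2) of a free particle."""
    return M * C ** 2 / math.sqrt(1.0 - (v / C) ** 2)


def integrand(v):
    """Integrand of the action when E equals the relativistic energy."""
    return relativistic_term(v) + relativistic_energy(v)


if __name__ == "__main__":
    for v in (0.0, 0.01, 0.1, 0.5, 0.9, 0.99):
        print(v, relativistic_term(v) - low_velocity_term(v), integrand(v))
```

The difference between the exact and the expanded term is of order $v^4$, and the integrand stays non-negative and grows with the speed, in line with the monotonic growth of the relativistic energy.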

Looking at equations (48) and (42), we see that the problem of identifying a
predominant dynamical law is equivalent to the problem of finding a geodesic
path between two fixed end-points in the system configuration space. The
metric of the configuration space is again determined only by the SP combined
with the Galilean or Einstein's principles of relativity. The case of a
particle in an external gravitational field trivially fits this scheme [^. Einstein's
principle of equivalence requires that an external gravitational field can
be introduced as an appropriate change in the metric of space-time, that is as
a change in the expression of $ds$ in terms of $dx$, $dy$, $dz$ and $dt$. Equation (48)
has the same form for all such expressions, and the requirement of minimum
of $A_S^{\rm rel}$ gives the standard equations of motion for a particle in an external
gravitational field [0.



5 Discussion 

We introduced a new physical principle - the Simplicity Principle. It is based 
on the classical principle of Occam's Razor which is a cornerstone of the mod- 
ern theory of induction and machine learning. Using the Simplicity Principle, 
we explained the general structure of the Lagrangian for a composite physical 
system. In fact, we explained all generic postulates of the Lagrangian formula- 
tion of physical dynamics. We demonstrated our approach using the examples 
of Newtonian mechanics, relativistic mechanics and the motion of a relativistic 
particle in an external gravitational field. We thereby establish a non-trivial 
link between the Simplicity Principle and the principle of stationary action. 

We have already mentioned that singling out the simplest hypothesis is not the 
optimal strategy of inductive inference. Ideally, we should consider all possible
dynamical laws $\{f\}$ weighted in accordance with their complexity $2^{-K_{\xi_0}[{}^{J}\!f]}$.
This is reminiscent of the Feynman path integrals approach to quantization. At 
present, in quantum field theory a typical derivation of Feynman path integrals 
from first principles cannot be considered mathematically rigorous [|r^ . There 
are problems with convergence and with analytic continuation from Minkowski 
space to Euclidean space. Using Kolmogorov complexity instead of Euclidean 
action would improve the convergence while preserving all results that can 
be attributed to the contributions of simple laws. Indeed, as suggested by the 
Kraft inequality, sums of the type $\sum_{f} 2^{-K_{\xi_0}[{}^{J}\!f]}$ are convergent: this is not always
the case with Feynman path integrals. Moreover, there is some independent 
evidence [|TB| that at least a qualitative relationship between the Euclidean ac- 



tion and Kolmogorov complexity should exist. Using Kolmogorov complexity 
instead of the Euclidean action may also be useful for quantum gravity P^ 



where, among others, the indefiniteness of the gravitational action is a serious 
problem PDI. These and other applications of the proposed approach are a 



matter for further research.

Appendix

To a large extent, the standard derivation of the Euler-Lagrange equations
is based on the well-known method of integration by parts. Here we briefly
review its discrete analogue - Abel's method of "summation by parts". We
then apply this method to derive (32).

Summation by parts

Let

$$U_J = \sum_{k=0}^{J} u_k \quad \text{and} \quad V_J = \sum_{k=0}^{J} v_k, \qquad (51)$$

where we adopt the usual convention that if $b < a$ then $\sum_{k=a}^{b} F(k) = 0$ for any
function $F$. The integration by parts method can most easily be demonstrated
as a consequence of the Leibniz rule

$$d(UV) = U\,d(V) + V\,d(U). \qquad (52)$$

By analogy we therefore compute

$$U_k V_k - U_{k-1}V_{k-1} = U_k(V_k - V_{k-1}) + V_{k-1}(U_k - U_{k-1}) = U_k v_k + V_{k-1}u_k. \qquad (53)$$

Summation from $k = 0$ to $k = J$ gives

$$U_J V_J = \sum_{k=0}^{J} U_k v_k + \sum_{k=1}^{J} V_{k-1}u_k. \qquad (54)$$
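The summation by parts formula (54) is easy to check on arbitrary sequences. The following sketch uses a hypothetical function name and integer sequences chosen only for illustration.

```python
from itertools import accumulate


def summation_by_parts_sides(u, v):
    """Return both sides of Eq. (54) for sequences u_0..u_J and v_0..v_J."""
    U = list(accumulate(u))   # partial sums U_k
    V = list(accumulate(v))   # partial sums V_k
    lhs = U[-1] * V[-1]
    rhs = sum(U[k] * v[k] for k in range(len(u)))
    rhs += sum(V[k - 1] * u[k] for k in range(1, len(u)))
    return lhs, rhs


if __name__ == "__main__":
    print(summation_by_parts_sides([3, -1, 4, 1, 5], [2, 7, -1, 8, 2]))
```

The second sum starts at $k = 1$ because the empty-sum convention makes $V_{-1} = 0$.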

This "summation by parts" formula can be used for manipulating discrete
sums just like the integration by parts is used to manipulate integrals. In
particular, we can derive Eq. (32) as a discrete analogue of the Euler-Lagrange
equations as follows. In Eqs. (51, 54) we replace the abstract summation index
$k$ with the time $t$, as explained for Eqs. (19) and (21), to get (using the earlier
defined notation)

$$U_J = \sum_{k=0}^{J} u_k = \sum_{t=0}^{\tau} u_t = U_\tau, \qquad (55)$$

where $u_t = U_t - U_{t-\Delta t} = \Delta U_t$. Writing analogous relations for $V_t$ and $v_t$, we
can rewrite Eq. (54) as

$$U_\tau V_\tau = \sum_{t=0}^{\tau} U_t\,\Delta V_t + \sum_{t=\Delta t}^{\tau} V_{t-\Delta t}\,\Delta U_t. \qquad (56)$$

One particle in a potential

The case of one particle moving in a potential $V(\mathtt{r}_t)$ is built upon the
approximation that the particle interacts with a massive system which determines the
potential, and whose dynamics are not sensitive to the particle motion.
Mathematically, such an approximation is performed as follows. The complexity of
the dynamical law ${}^{J}\!f$ for the whole system has the form

$$K_{\xi_0}[{}^{J}\!f] \propto \sum_{t=0}^{\tau-\Delta t} K(\ddot{\mathtt{r}}_{t+\Delta t}, \ddot{\mathtt{r}}^1_{t+\Delta t}, \dots, \ddot{\mathtt{r}}^N_{t+\Delta t}\,|\,\xi_t)\,\Delta t, \qquad (57)$$

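where the upper indices from 1 to $N$ refer to the particles of the massive
system. Formally separating our particle from the massive system we have
from (31)

$$K(\ddot{\mathtt{r}}_{t+\Delta t}, \ddot{\mathtt{r}}^1_{t+\Delta t}, \dots, \ddot{\mathtt{r}}^N_{t+\Delta t}\,|\,\xi_t) = \frac{m(\dot{\mathtt{r}}_t)^2}{2} - \left[V_{N+1} - \sum_{n=1}^{N}\frac{m_n(\dot{\mathtt{r}}^n_t)^2}{2}\right] + \mathrm{const}. \qquad (58)$$

To make the above approximation we assume that the equations of motion
for the massive system are known and are not affected by the motion of our
particle. This means that variables $\{\mathtt{r}^n_t\}$ can be eliminated and the expression
in square brackets can be replaced by an effective interaction $V(\mathtt{r}_t)$:

$$K(\ddot{\mathtt{r}}_{t+\Delta t}, \ddot{\mathtt{r}}^1_{t+\Delta t}, \dots, \ddot{\mathtt{r}}^N_{t+\Delta t}\,|\,\xi_t) \approx K(\ddot{\mathtt{r}}_{t+\Delta t}\,|\,\mathtt{r}_t, \dot{\mathtt{r}}_t, t) = \frac{m(\dot{\mathtt{r}}_t)^2}{2} - V(\mathtt{r}_t) + \mathrm{const}. \qquad (59)$$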

Although useful in practice, this approximation ruins the connection between 
the interaction and the mutual information. Effective interaction V{rt) has 
contributions from the kinetic energy terms and it strongly depends on the 
assumed equations of motion. Thus, the information-theoretic interpretation 
of the interaction terms in the Lagrangian is only valid for the fundamental 
interactions: before any approximations are made. 

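For simplicity, we shall consider the case of one dimensional motion, where $\dot{\mathtt{r}}_t$
and $\mathtt{r}_t$ can be considered as scalars. Generalization to the multidimensional
case is essentially trivial. We define the discrete variation in absolute analogy
with standard variational calculus

$$\delta V(\mathtt{r}_t) = V(\mathtt{r}_t + \delta\mathtt{r}_t) - V(\mathtt{r}_t), \qquad (60)$$

where $\delta\mathtt{r}_t$ is a virtual change of the function $\mathtt{r}_t$. For instance if
$T(\dot{\mathtt{r}}_t) = m(\dot{\mathtt{r}}_t)^2/2$ then

$$\delta T(\dot{\mathtt{r}}_t) = \frac{m}{2}\left[(\dot{\mathtt{r}}_t + \delta\dot{\mathtt{r}}_t)^2 - (\dot{\mathtt{r}}_t)^2\right] = m\dot{\mathtt{r}}_t\,\delta\dot{\mathtt{r}}_t + m(\delta\dot{\mathtt{r}}_t)^2/2. \qquad (61)$$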
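Unlike its continuous counterpart, the discrete variation keeps the second-order term exactly. A quick check of Eq. (61) in exact rational arithmetic is sketched below; the function names and sample values are hypothetical.

```python
from fractions import Fraction


def kinetic(v, m):
    """Kinetic term T(v) = m * v**2 / 2."""
    return m * v * v / 2


def discrete_variation(func, x, dx, *args):
    """Discrete variation in the sense of Eq. (60): func(x + dx) - func(x)."""
    return func(x + dx, *args) - func(x, *args)


if __name__ == "__main__":
    m, v, dv = Fraction(3), Fraction(5, 7), Fraction(1, 9)
    print(discrete_variation(kinetic, v, dv, m), m * v * dv + m * dv * dv / 2)
```

The two printed values coincide for any finite `dv`, which is why the term $\delta\dot{\mathtt{r}}_t/2$ survives in the discrete form of Newton's second law.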
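To minimize the sum (21) we require that, up to second order in $\delta\mathtt{r}_t$ and $\delta\dot{\mathtt{r}}_t$,

$$\delta\sum_{t=0}^{\tau-\Delta t}\left[T(\dot{\mathtt{r}}_t) - V(\mathtt{r}_t)\right]\Delta t = \sum_{t=0}^{\tau-\Delta t}\left[\delta T(\dot{\mathtt{r}}_t) - \delta V(\mathtt{r}_t)\right]\Delta t = 0. \qquad (62)$$

Because we do not vary the functions $\mathtt{r}_t$ and $\dot{\mathtt{r}}_t$ at the end points $t = 0$ and
$t = \tau$, we have $\delta T(\dot{\mathtt{r}}_\tau) = 0$ and therefore (62) is equivalent to

$$\sum_{t=0}^{\tau}\delta T(\dot{\mathtt{r}}_t)\,\Delta t - \sum_{t=\Delta t}^{\tau}\delta V(\mathtt{r}_{t-\Delta t})\,\Delta t = 0. \qquad (63)$$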

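Noticing that

$$\delta\dot{\mathtt{r}}_t = \frac{\delta\mathtt{r}_t - \delta\mathtt{r}_{t-\Delta t}}{\Delta t} = \frac{\Delta\delta\mathtt{r}_t}{\Delta t} \qquad (64)$$

we have

$$\sum_{t=0}^{\tau}\delta T(\dot{\mathtt{r}}_t)\,\Delta t = \sum_{t=0}^{\tau}\frac{\delta T(\dot{\mathtt{r}}_t)}{\delta\dot{\mathtt{r}}_t}\,\Delta\delta\mathtt{r}_t, \qquad (65)$$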

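where it is understood that $\delta T(\dot{\mathtt{r}}_\tau)/\delta\dot{\mathtt{r}}_\tau = 0$ as required by the transition from
Eq. (62) to Eq. (63). Setting $U_t = \delta T(\dot{\mathtt{r}}_t)/\delta\dot{\mathtt{r}}_t$ and $V_t = \delta\mathtt{r}_t$ for all $t$, we use
summation by parts (56) to show

$$\sum_{t=0}^{\tau}\frac{\delta T(\dot{\mathtt{r}}_t)}{\delta\dot{\mathtt{r}}_t}\,\Delta\delta\mathtt{r}_t = -\sum_{t=\Delta t}^{\tau}\delta\mathtt{r}_{t-\Delta t}\,\Delta\frac{\delta T(\dot{\mathtt{r}}_t)}{\delta\dot{\mathtt{r}}_t}. \qquad (66)$$

Combining Eqs. (63) and (66), we have, up to second order in $\delta\mathtt{r}_t$,

$$\sum_{t=\Delta t}^{\tau}\left[\frac{\Delta}{\Delta t}\frac{\delta T(\dot{\mathtt{r}}_t)}{\delta\dot{\mathtt{r}}_t} + \frac{\delta V(\mathtt{r}_{t-\Delta t})}{\delta\mathtt{r}_{t-\Delta t}}\right]\delta\mathtt{r}_{t-\Delta t}\,\Delta t = 0. \qquad (67)$$

This must be true for arbitrary values of $\delta\mathtt{r}_{t-\Delta t}$ and therefore we demand

$$\frac{\Delta}{\Delta t}\frac{\delta T(\dot{\mathtt{r}}_t)}{\delta\dot{\mathtt{r}}_t} + \frac{\delta V(\mathtt{r}_{t-\Delta t})}{\delta\mathtt{r}_{t-\Delta t}} = 0. \qquad (68)$$

Now substituting $T(\dot{\mathtt{r}}_t) = m(\dot{\mathtt{r}}_t)^2/2$ and using Eq. (61) we have (32) as required.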